Cost-effectiveness of HPV vaccination in 195 countries: A meta-regression analysis

Cost-effectiveness analysis (CEA) is a well-known, but resource intensive, method for comparing the costs and health outcomes of health interventions. To build on available evidence, researchers are developing methods to transfer CEA across settings; previous methods do not use all available results nor quantify differences across settings. We conducted a meta-regression analysis of published CEAs of human papillomavirus (HPV) vaccination to quantify the effects of factors at the country, intervention, and method-level, and predict incremental cost-effectiveness ratios (ICERs) for HPV vaccination in 195 countries. We used 613 ICERs reported in 75 studies from the Tufts University’s Cost-Effectiveness Analysis (CEA) Registry and the Global Health CEA Registry, and extracted an additional 1,215 one-way sensitivity analyses. A five-stage, mixed-effects meta-regression framework was used to predict country-specific ICERs. The probability that HPV vaccination is cost-saving in each country was predicted using a logistic regression model. Covariates for both models included methods and intervention characteristics, and each country’s cervical cancer burden and gross domestic product per capita. ICERs are positively related to vaccine cost, and negatively related to cervical cancer burden. The mean predicted ICER for HPV vaccination is 2017 US$4,217 per DALY averted (95% uncertainty interval (UI): US$773–13,448) globally, and below US$800 per DALY averted in 64 countries. Predicted ICERs are lowest in Sub-Saharan Africa and South Asia, with a population-weighted mean ICER across 46 countries of US$706 per DALY averted (95% UI: $130–2,245), and across five countries of US$489 per DALY averted (95% UI: $90–1,557), respectively. Meta-regression analyses can be conducted on CEA, where one-way sensitivity analyses are used to quantify the effects of factors at the intervention and method-level. Building on all published results, our predictions support introducing and expanding HPV vaccination, especially in countries that are eligible for subsidized vaccines from GAVI, the Vaccine Alliance, and Pan American Health Organization.

In the third stage, we use the nonlinear response curve estimated in the second stage to select potential bias covariates, using a generalized Lasso approach for linear mixed effects models, detailed in Section S3.3.
In the fourth stage, as described in Section S3.4, we use 10-fold cross-validation to select the standard deviation of a Gaussian prior to apply to all covariates other than those analyzed in the first stage.
In the fifth stage, we include covariates that were detected in the third stage, along with the nonlinear response, and consider a mixed effects model with a random intercept, as discussed in Section S3.5.

S3.1. Crosswalk analyses of sensitivity analysis covariates
Univariate sensitivity analyses are used to estimate the effect of a variable on the ICER with all other variables held constant by definition. We analyzed the difference in log-ICERs between sensitivity analyses and the corresponding reference analyses using models which we refer to as crosswalks. Including the results of these models as priors in subsequent steps of the analysis decreases omitted variable bias by giving more influence to pairs of ICERs which we know differ in no unmeasured variables. It also stabilizes estimates in the presence of multicollinearity. See the correlation matrix in Table S3.1 below.
We conducted crosswalk analyses for four variables, vaccine cost, cost discount rate, DALY/QALY discount rate, and coverage. These covariates had a sufficient number of sensitivity analyses for us to control for study-specific variables not included in our model by using comparisons of sensitivity analyses and main results. For each of these covariates, we paired each sensitivity analysis with another ICER from the same study and location and which differed only in that covariate. For these covariates, we fit separate models of the form

S3.2. Estimation of Nonlinear log-GDP per capita response curve
The relationship between log-ICER and log-GDP per capita is modeled using a basis spline (B-spline) [3,4]. In this section, we present B-splines, specification of constraints, discussion of trimming, and summary of spline ensembles. Portions of Section S3.2 have been reproduced or adapted from GBD Risk Factor Collaborators [6].

S3.2.1. B-splines linear tails
A spline basis is a set of piecewise polynomial functions with designated degree and domain. If we denote polynomial order by p, and the number of knots by k, we need p + k basis elements s p j , which can be generated recursively as illustrated in Figure S3.1. Given such a basis, we can represent any curvilinear relationship as the linear combination of the spline basis elements, with coefficients β ∈ R p+k : An explicit representation of (2) is obtained by building a design matrix X. Given a set of t values at which we have data, the jth column of X is given by the expression For extreme values of log-GDP per capita with little data, we need the capability to ensure that the outermost segments of the spline are linear, with slopes that match the adjacent segment at the knot. Splines with linear tails are often called natural splines.

S3.2.2. Robust Trimming Strategy
To robustify the approach against outliers, we use the trimming strategy, as discussed in [1,11]. The estimator min where f i is as in (2) and B encodes all necessary constraints is extended to the 'trimmed' estimator min w∈∆, β∈B i where each w i is required to be between 0 and 1, and the total mass of w is constrained to equal 90% of the data volume. Specifically, this means that where N is the total number of data points across studies. Thus the trimmed estimator finds the 90% most fittable data and fits them for β. Selecting a proportion of trimming has had a long history in terms of theory [9] and recent methodological innovations [1]. However, thus far it has not been possible to automatically select the expected number of inliers. We chose 90% in order to include the vast majority of the data while remaining robust to a potential set of outliers. The same choice has been made in larger systematic analyses [6] as well. Alternative proportions are available, but we did not experiment with them.

S3.2.3. Spline ensemble
Every model estimate intrinsically depends on the choice of knot placement used to generate the spline.
To remove the effect of this choice on the estimates, we develop an ensemble over this knot placement, leaving only the choice of spline degree and number of knots as modeling choices.
Given the degree and number of knots, we automatically sample a set of knot placements for a feasible knot distribution. For each resulting knot placement, we fit a spline (using the trimming estimator) and then evaluate each resulting model by computing its fit and curvature, aggregating the final model as a weighted combination of the ensemble.

S3.2.4. Sampling Knots From Simplex
To establish a reasonable feasible set from which to sample, we prefix a minimal set of the rules for the knot-placement and uniformly sample from this feasible set. Given a number of knots, the rules specify feasible ranges for each knot, and feasible gaps between knots. Specifically, given an interval [t 0 , t k ] delimited by terminal knots (which are always the minimum and maximum of the data), the feasible region of the interior knots t 1 , . . . , t k−1 is given by We enforce the rules The knot placement that satisfy these four rules comprise a closed polyhedron {t : Pt ≤ p}, where, We calculate the vertices of the polyhedron using the double description method in [7], and uniformly sample knot-placements from within the polyhedron. Each knot placement yields a model, fit using the trimmed constrained spline approach described above.

S3.2.5. Scoring
Once the ensemble is created, we score the resulting risk curves using two criteria: model fit (measured using the log-likelihood) and total variation (measured using the highest order derivative). These scores balance competing objectives of fit and generalizability. Once we have these scores, denoted as s 1 and s 2 , we normalize them to the range [0, 1]: and apply a logistic transformation. The transformation is used to make the scoring meaningful even in the presence of spurious curves in a large ensemble. We then multiply the scores 2 . to down-weight models that are low under either criterion (fit or total variation). The final weights are normalized to sum to 1.

S3.2.6. New nonlinear signal covariate
We fit a model of log-ICER on log-GDP per capita using a robust spine ensemble on log-GDP per capita with degree 2, two knots, and linear tails. This model also includes as covariates log cervical cancer DALYs per capita and the four crosswalk covariates. We placed Gaussian priors with meansα c and standard deviation SE[α c ] on the crosswalk covariates' coefficients. We used this model to generate a nonlinear log-GDP per capita response curve, which is encoded into a new nonlinear covariate called 'signal' and included in subsequent stages of the analysis. The shape of this transformation is displayed in Figure S3.3.
In particular, this allows us to fit a linear mixed effects model of the form where ij ∼ N (0, σ 2 ij ) are known each observation, and u i ∼ N (0, γ) is a random study-specific intercept, with unknown variance γ.

S3.3. Covariate selection using Lasso
Additional covariates are selected using a Lasso stategy described below in the context of linear mixed effects models [2] [8]. In considering potential covariates, we enforce that every categorical covariate has some variation; in particular every indicator covariate has at least two studies in each category. • We iteratively decrease the weight on the Lasso regularizer and let coefficients of bias-covariates enter the model in the order derived from the Lasso solutions.
• As a group of coefficients enters the model, we test it for statistical significance.
-If the coefficients are significant, we compute their posterior distribution and use this posterior as the prior for these coefficients for the next round.
-If the coefficients are not significant, the process terminates, and we return the list of (significant) covariates obtained so far.
We included the signal and the four crosswalk covariates as pre-selected covariates without the Lasso regularizer in all models above. We added Gaussian priors to the coefficients of the crosswalk covariates with meanα c and standard deviation SE[α c ], as estimated in Section S3.1.
Covariates with low variance or that are highly correlated with others are unlikely to be selected by this process, since including them would likely inflate the variance of the resulting estimators by an amount that outweighs the reduction in bias. This is a limitation of the current data set, and future work to expand the data set by extracting sensitivity analyses for a wider number of covariates could allow for the stable estimation of additional parameters.
There is ongoing methodological work to improve variable selection in the presence of collinearity. Based on early work showing the advantages of bridge regression vs. lasso [5] in the presence of correlation, the elastic net penalty [12] has been used, and in principle able to find groups of correlated predictors. Practical use requires additional parameter selection. We are currently looking into methods based on nonconvex regularizes as well [10]. These innovations can further improve variable selection in future work, but now we test for collinearity using basic tests before the lasso procedure starts.
One of the difficult questions in any variable selection procedure is when to stop. The methodology in step 1 builds on the Lasso methodology, but has an automatic termination criteria, stopping as soon as sequentially selected variables (selected across a range of the Lasso parameter) cease to be statistically significant in a standard Gaussian analytical framework.
Bias covariates that pass the selection process are included in the next stage of the model fitting.

S3.4. Gaussian prior cross validation
In order to further safeguard against overfitting, we included a Gaussian prior on all covariates. We used 10-fold cross validation to determine the prior standard deviation, τ cv , to apply to the coefficients of all covariates other than the four crosswalk covariates. We fit models of the same form as (4) with priors on coefficients β. For the four crosswalk covariates, we used the priors calculated in Section S3.1. For all others, we used a common N (0, τ 2 cv ) prior on their coefficients after standardizing the covariates to have mean 0 and unit variance. We used a grid-search to select the value of τ cv , that minimizes the MSE for predicting data in the hold-out set.
Parameters β and γ are obtained using maximum likelihood, as detailed in [11]. Standard errors of β are estimated by taking the standard deviation across 1000 samples from the posterior distribution of β.